(57) ABSTRACT

A signal-to-noise ratio dependent adaptive spectral subtrac- 
tion process eliminates noise from noise-corrupted speech 
signals. The process first pre-emphasizes the frequency 
components of the input sound signal which contain the 
consonant information in human speech. Next, a signal-to- 
noise ratio is determined and a spectral subtraction propor- 
tion adjusted appropriately. After spectral subtraction, low 
amplitude signals can be squelched. A single microphone is 
used to obtain both the noise-corrupted speech and the 
average noise estimate. This is done by determining if the 
frame of data being sampled is a voiced or unvoiced frame. 
During unvoiced frames an estimate of the noise is obtained. 
A running average of the noise is used to approximate the 
expected value of the noise. Spectral subtraction may be 
performed on a composite noise-corrupted signal, or upon 
individual sub-bands of the noise-corrupted signal. Pre- 
averaging of the input signal’s magnitude spectrum over 
multiple time frames may be performed to reduce musical 
noise. 

31 Claims, 5 Drawing Sheets

COMMUNICATION SYSTEM WITH
ADAPTIVE NOISE SUPPRESSION 

RELATED APPLICATION 

This application is a continuation-in-part and claims pri- 
ority to U.S. patent application Ser. No. 09/163,794 filed 
Sep. 30, 1998 now abandoned and titled “Communication 
System with Adaptive Noise Suppression” which claims 
priority to U.S. Provisional Patent Application Ser. No.
60/092,153 filed Jul. 9, 1998 and titled “Communication 
System with Adaptive Noise Suppression,” both applica- 
tions of which are commonly assigned and the entire con- 
tents of which are incorporated herein by reference. 

ORIGIN OF THE INVENTION 

The invention described herein was made in the perfor- 
mance of work under a NASA contract and by an employee 
of the United States Government and is subject to the
provisions of the Public Law 96-517 (35 U.S.C. §202) and 
may be manufactured and used by or for the Government for 
governmental purposes without the payment of any royalties 
thereon or therefore. 

TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to communication
systems and, in particular, to adaptive noise suppression in
processing voice communications.

BACKGROUND OF THE INVENTION 

Voice communication systems are susceptible to non-speech
noise. One source of such noise can be environmental
factors, such as transportation vehicles. This noise typically
enters the communication system through a microphone used
to receive voice sound. To improve the quality of the speech
communication, efforts have been made to eliminate the
undesired noise.

One type of noise suppression which uses band pass filters
to remove noise at specific frequencies is described in U.S.
Pat. No. 5,432,859 entitled “Noise-Reduction System”
issued Jul. 11, 1995 to Yang et al. A system which reduces
noise using spectral subtraction is described in U.S. Pat. No.
5,610,991 entitled “Noise reduction System and Device, and
a Mobile Radio Station” issued Mar. 11, 1997 to Janse.
Further, a system which uses power spectral subtraction is
described in U.S. Pat. No. 5,668,927 entitled “Method for
Reducing Noise in Speech Signals by Adaptively Controlling
a Maximum Likelihood Filter for Calculating Speech
Components” issued Sep. 16, 1997 to Chan et al.

These noise suppression systems do not provide for
amplification of specific frequencies of the voice signals
prior to performing an adaptive noise suppression operation.
For the reasons stated above, and for other reasons stated
below that will become apparent to those skilled in the art
upon reading and understanding the present specification,
there is a need in the art for alternative noise suppression
communication systems.

SUMMARY 

The above mentioned problems with communication
equipment and other problems are addressed by the present
invention and will be understood by reading and studying
the following specification.


In one embodiment, the present invention describes a 
voice communication system comprised of a microphone for 
receiving input sound signals and a processor for suppress- 
ing noise signals received with the input sound signals. The 
processor first pre-emphasizes the frequency components of 
the input sound signal which contain the consonant infor- 
mation in human speech. Next, the processor determines and 
updates an input sound signal-to-noise ratio. Using this ratio,
it performs an adaptive spectral subtraction operation to 
subtract the noise signals from the input sound signals to 
provide output signals which are an estimate of voice signals 
provided in the input sound signals. A second filtering 
operation is performed for attenuating the portion of the 
output signals which contains musical noise. A squelching 
operation is then performed in the time domain to further 
eliminate musical noise. An analog-to-digital converter with 
an anti-aliasing filter is used to convert the input sound 
signals to digital signals for input to the processor, and a 
digital-to-analog converter with smoothing filter is provided 
to convert the output signals to analog signals for commu- 
nication to a listener. 

In another embodiment, a voice communication system 
comprises a microphone for receiving input sound signals, 
and a processor for suppressing noise signals received with 
the input sound signals. The processor pre-emphasizes fre- 
quency components of the input sound signals which contain 
consonant information in human speech. The processor also 
determines and updates an input sound signal-to-noise signal 
ratio, and performs an adaptive spectral subtraction opera- 
tion using the input sound signal-to-noise signal ratio to 
subtract the noise signals from the input sound signals to 
provide output signals which are an estimate of voice signals 
provided in the input sound signals. A filter is provided for 
attenuating the portion of the output signals which contains 
musical noise. The voice communication system further 
comprises an analog-to-digital converter for converting the 
amplified input sound signals to digital signals for input to 
the processor, and digital-to-analog converter for converting 
the output signals to analog signals for communication to a 
listener. 

In a further embodiment, a method of reducing noise in a 
communication system is provided. The method comprises 
receiving an input signal containing noise signals and speech 
signals, amplifying a portion of the input signal containing 
consonant information in the speech signals, spectrally 
subtracting an estimated noise signal from a magnitude of 
the input signal to provide a noise reduced signal, and 
attenuating a portion of the noise reduced signal containing 
voice signals to provide an output signal. 

In a still further embodiment, a method of reducing noise 
in a communication system is provided. The method 
includes determining an average magnitude of a noise 
spectrum while speech is not present on an input sound signal,
wherein the average magnitude is determined for each of a 
plurality of frequency sub-bands of the noise spectrum. The 
method further includes determining a maximum ratio of 
noise to average noise over each sub-band and determining 
a running average of the maximum ratio of noise to average 
noise over each sub-band. The method still further includes 
receiving an indication that speech may be present on the 
input sound signal and, for each of a plurality of frames 
while receiving the indication that speech may be present on 
the input sound signal, detecting whether speech is present. 
While speech is detected, the method includes estimating a 
speech signal by subtracting from each sub-band the average 
noise for that sub-band multiplied by the lesser of the 
average magnitude of the noise spectrum for that sub-band
and the running average of the maximum ratio of noise to
average noise for that sub-band. While speech is not 
detected, the method includes estimating the speech signal 
to be zero. 

The invention further includes methods and apparatus of
varying scope. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of an adaptive noise suppression
system in accordance with an embodiment of the invention. 

FIG. 2 illustrates a flow diagram of an adaptive spectral 
subtraction processor in accordance with an embodiment of 
the invention. 

FIGS. 3a and 3b are vector representations of signal
components in accordance with one embodiment of the 
invention. 

FIGS. 4a and 4b illustrate signal processing using
windowing, zero padding, and recombination in accordance
with one embodiment of the invention.

DETAILED DESCRIPTION OF THE 
INVENTION 

In the following detailed description of the preferred
embodiments, reference is made to the accompanying
drawings that form a part hereof, and in which is shown by way
of illustration specific preferred embodiments in which the
inventions may be practiced. These embodiments are
described in sufficient detail to enable those skilled in the art
to practice the invention, and it is to be understood that other
embodiments may be utilized and that logical, mechanical
and electrical changes may be made without departing from
the spirit and scope of the present invention. The following
detailed description is, therefore, not to be taken in a limiting
sense, and the scope of the present invention is defined only
by the appended claims and equivalents thereof.

As described above, it is desired to incorporate adaptive
noise suppression into communication equipment, in
particular speech communication equipment provided on
transportation equipment susceptible to high levels of noise, such
as an Emergency Egress Vehicle and a Crawler-Transporter
used by the National Aeronautics and Space Administration
(NASA). The Emergency Egress Vehicle is generally a
military tank used to evacuate astronauts during an
emergency, while the Crawler-Transporter is used to move a
space shuttle to its launch site. In the case of the Emergency
Egress Vehicle, people are fixed relative to the primary noise
source, and the spectral content of the noise source changes
as a function of the speed of the vehicle and its engine. In
the case of the Crawler-Transporter, people can move
relative to the Crawler-Transporter. Thus, the noise a person
hears varies with their location relative to the
Crawler-Transporter. Further, the operation of a hydraulic leveling
device provided on the Crawler-Transporter changes the
noise level experienced. It will be appreciated that the
present communication system can be used in numerous
applications, including but not limited to commercial
delivery environments, aircraft communication, automobile
racing, and military vehicles.

Due to the varying nature of the noise in these
environments, an adaptive algorithm is provided to remove noise.
Because the noise frequencies produced by most of the
transportation applications are in the voice band range,
standard filtering techniques will not work. A signal-to-noise
ratio dependent adaptive spectral subtraction algorithm is
described herein which eliminates the noise.

A block diagram of an adaptive noise suppression system
100 is shown in FIG. 1. The system includes a microphone 
102 for receiving voice and environmental noise signals. In 
one embodiment, a microphone is used which has noise 
suppression of a mechanical nature, and which provides 
approximately 15 dB of noise suppression. This suppression 
level is sufficient to provide a signal-to-noise ratio favorable 
for spectral subtraction. The system includes an amplifying 
filter 106 for proper signal level and anti-aliasing, an analog- 
to-digital converter 108, an adaptive digital signal processor 
(DSP) 110, a digital-to-analog converter 112, and a smooth- 
ing filter. 

In operation, noise or noise-corrupted speech enters the 
microphone. A high gain amplifier 104 is provided to 
amplify the voice signal up to a ±2.5 Volt range for pro- 
cessing by the Analog-to-digital (A/D) converter. The ampli- 
fication level, therefore, is dependent upon the A/D con- 
verter used. Before entering the A/D converter, the amplified 
signal passes through an anti-aliasing low-pass filter. In one 
embodiment, the filter has a 3 dB attenuation at 3 KHz, and 
a 30 dB attenuation at 5.9 KHz. The filtered signal is then 
sampled by the A/D converter. In one embodiment, the A/D 
converter uses a 12-bit resolution and a 12.05 KHz sampling
rate. The digitized signal is then processed by the DSP. The 
digital signal processor performs pre-emphasis filtering and 
noise suppression using signal-to-noise ratio dependent 
adaptive spectral subtraction, described in detail below. The 
processor first pre-emphasizes the frequency components of 
the input sound signal which contain the consonant infor- 
mation in human speech. By emphasizing this signal region, 
the noise suppression of the system is enhanced. 

The system pre-emphasizes (amplifies) higher frequency 
components of received sound, including the noise and 
voice components in accordance with the power character- 
istics of human speech. Even though most of the energy of 
speech is contained in the lower frequency range (about 300 
to 1000 Hz), amplifying upper frequencies of above about 
1000 Hz amplifies more consonant speech information. In 
one embodiment, therefore, the amplification upper range is 
about 1000 to the sample frequency divided by two. The 
pre-emphasis is performed prior to spectral subtraction to 
give the higher frequency components more importance 
during spectral subtraction. Thus, the intelligibility of 
speech is improved during the subtraction process. The 
resulting output signals are then de-emphasized (attenuated) 
to reduce the effect of musical noise. An optional squelching 
operation is then performed in the time domain to further 
eliminate musical noise. 
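
The pre-emphasis step can be illustrated with a short sketch. The text does not state which filter is used, so the first-order difference form y[n] = x[n] - c x[n-1], the coefficient c = 0.95, and the helper names `pre_emphasize`, `tone`, and `peak` are all assumptions made for illustration only. The 12.05 KHz sampling rate is taken from the embodiment described above.

```python
import math


def pre_emphasize(samples, coeff=0.95):
    """Boost high frequencies with y[n] = x[n] - coeff * x[n-1].

    The filter form and the coefficient are assumptions; the text only says
    that components above about 1000 Hz are amplified.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def tone(freq_hz, fs_hz=12050.0, n=512):
    """Unit-amplitude sine at freq_hz, sampled at fs_hz (test signal only)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(v) for v in samples)


# A 3 kHz tone leaves the filter much stronger than a 300 Hz tone.
high = peak(pre_emphasize(tone(3000.0)))
low = peak(pre_emphasize(tone(300.0)))
print(high > low)
```

With these assumed values the filter passes the upper voice band nearly unchanged while strongly attenuating the region where most vowel energy lies, which is one way to give consonant information more weight before subtraction.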

After the noise is removed, the digital signal is converted 
back to an analog signal in a digital-to-analog converter 
(D/A). Again in one embodiment, the D/A converter oper- 
ates at a rate of 12.05 KHz. 

The analog signal is then processed through a smoothing 
filter. In one embodiment, a low-pass Bessel filter with a 3
dB frequency of 3 KHz is used. This filter can be replaced with
a voice band filter, which is a band-pass filter with low and
high 3 dB passband frequencies of 300 Hz and 3 KHz,
respectively. If the voice band filter does not have good damping
characteristics, the smoothing filter is necessary to eliminate 
transients produced from step discontinuities resulting from 
the D/A conversion. After the voice band filter, the signal is 
modulated and transmitted by a communication device. A 
detailed description of the DSP is provided in the following 
section.

ADAPTIVE DIGITAL SIGNAL PROCESSOR

A flow diagram of an adaptive spectral subtraction
processor which is signal-to-noise ratio dependent is shown in
FIG. 2. Before providing a detailed description of the signal
processor implementation, a description of the spectral
subtraction algorithm is provided.

The additive noise model used for spectral subtraction
assumes that noise-corrupted speech is composed of speech
plus additive noise. Noise-corrupted speech, x(t), is defined
by:

$$x(t) = s(t) + n(t),$$

where s(t) is speech, and n(t) is noise. In a basic manner, to
solve for the speech, the noise is subtracted from the
noise-corrupted speech. To focus on the magnitude of noise,
a Fourier Transform of x(t):

$$X(f) = S(f) + N(f)$$

is first taken. Because X(f), S(f), and N(f) are complex, they
can be represented in polar form as:

$$|X(f)|e^{j\theta_x} = |S(f)|e^{j\theta_s} + |N(f)|e^{j\theta_n} \tag{1}$$

Solving for the speech:

$$|S(f)|e^{j\theta_s} = |X(f)|e^{j\theta_x} - |N(f)|e^{j\theta_n} \tag{2}$$

Since the phase of the noise is generally unavailable, the
phase of the noise-corrupted speech is used to approximate
the phase of the speech. This is equivalent to assuming the
noise-corrupted speech and the noise are in phase. As a
result, the speech magnitude is approximated from the
difference of the noise-corrupted speech magnitude and
noise magnitude as:

$$\hat{S}(f) = |\hat{S}(f)|e^{j\theta_x} = (|X(f)| - |N(f)|)e^{j\theta_x} \tag{3}$$

The type of spectral subtraction described above is a
magnitude spectral subtraction, because the magnitude of
the noise spectrum at each frequency is subtracted. In its
most general form, the implemented spectral subtraction
algorithm is written as:

$$\hat{S}(f) = \left[\,|X(f)|^{b} - \alpha(SNR(f))\,E\!\left[|N(f)|^{b}\right]\right]^{1/b} e^{j\theta_x} \tag{4}$$

where $E[|N(f)|^{b}]$ represents the expected value of $|N(f)|^{b}$.
The exponent, b, equals one for magnitude spectral
subtraction and two for power spectral subtraction. The proportion
of noise subtracted, $\alpha$, can be variable and signal-to-noise
ratio dependent. In general $\alpha$ is greater than one, to over
subtract and reduce distortion caused from using the average
noise magnitude instead of the actual noise magnitude. The
inverse Fourier Transform yields an estimate of the speech
as:

$$\hat{s}(t) = \mathcal{F}^{-1}\{\hat{S}(f)\} \tag{5}$$
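
The general form in equation (4) can be sketched bin by bin as follows. This is a minimal illustration, not the processor's implementation: the function name `spectral_subtract` and its argument names are hypothetical, the proportion is passed in as a fixed number rather than computed from the signal-to-noise ratio, and `noise_avg` is assumed to already hold the expected value of the noise magnitude raised to the power b. Negative differences are set to zero, as the description states later for the magnitude estimates.

```python
import cmath


def spectral_subtract(x_spectrum, noise_avg, a=1.0, b=1.0):
    """Apply equation (4) to each complex bin of the corrupted spectrum.

    x_spectrum: complex bins of the noise-corrupted speech, X(f).
    noise_avg:  assumed to hold E[|N(f)|**b] for the same bins.
    a:          proportion of noise subtracted (fixed here for simplicity).
    b:          1 for magnitude subtraction, 2 for power subtraction.
    """
    estimate = []
    for x_bin, n_avg in zip(x_spectrum, noise_avg):
        magnitude, phase = cmath.polar(x_bin)
        reduced = magnitude ** b - a * n_avg
        reduced = max(reduced, 0.0)  # negative estimates are set to zero
        # Reuse the phase of the corrupted speech for the estimate.
        estimate.append(cmath.rect(reduced ** (1.0 / b), phase))
    return estimate


# One bin with |X| = 5 and an average noise magnitude of 2 leaves |S| = 3.
print(abs(spectral_subtract([3 + 4j], [2.0])[0]))
```

Passing `b=2.0` with `noise_avg` holding average noise power gives the power spectral subtraction variant from the same function.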

The phase approximation used in the speech estimate
produces both magnitude and phase distortion in each
frequency component of the speech estimate. This can be seen
in FIGS. 3A and 3B by the vector representations of
$|S(f)|e^{j\theta_s}$ and $\hat{S}(f)$, respectively, for any one
frequency. If the magnitude of the noise, $|N|$, is small relative
to the magnitude of the corrupted speech, $|X|$, the distortion
caused by using the noise-corrupted speech phase, $\theta_x$, in
place of the noise phase is minimal and unnoticeable to the
human ear. Likewise, if the phase of the noise, $\theta_n$, is close
to the phase of the corrupted speech, $\theta_x$, the resulting error
produced by the approximation is minimal and unnoticeable
to the human ear. Since the relative phase between $\theta_x$ and
$\theta_n$ is unknown and varies with time and frequency, the ratio
between the magnitude of the noise-corrupted speech and the
noise is used as an indication of accuracy.
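
A small numerical sketch may help show why the magnitude ratio is a reasonable indicator of accuracy. It builds one frequency component from a known speech vector and a noise vector, applies the in-phase approximation, and reports the resulting magnitude and phase errors. The function name and the choice of the speech phase as the zero reference are assumptions made for illustration.

```python
import cmath
import math


def phase_approximation_error(speech_mag, noise_mag, phase_gap_rad):
    """Return (magnitude error, phase error in radians) for one frequency.

    The true speech is placed at phase zero and the noise at phase_gap_rad.
    The estimate subtracts the noise magnitude from |X| and keeps the phase
    of X, which is the approximation described in the text.
    """
    speech = cmath.rect(speech_mag, 0.0)
    noise = cmath.rect(noise_mag, phase_gap_rad)
    corrupted = speech + noise
    est_mag = max(abs(corrupted) - noise_mag, 0.0)
    estimate = cmath.rect(est_mag, cmath.phase(corrupted))
    return abs(estimate) - speech_mag, cmath.phase(estimate)


# Weak noise at a 90 degree offset distorts far less than strong noise.
print(phase_approximation_error(1.0, 0.1, math.pi / 2))
print(phase_approximation_error(1.0, 1.0, math.pi / 2))
```

When the phase gap is zero both errors vanish, and for a fixed gap both grow as the noise magnitude approaches the speech magnitude.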

An implementation of the spectral subtraction algorithm
is illustrated in FIG. 2. First, m/2 samples of the
noise-corrupted speech signal are taken and appended to the
previous m/2 samples. These m samples are then windowed and
zero padded. The process of appending, windowing, and zero
padding of the signal is shown in FIG. 4a. Thus, the sampled
signal is segmented into frames each containing 2m points. This is
required since the algorithm uses a Fast Fourier Transform 
(FFT) which assumes that the signal is periodic relative to 
the frames. If a window is not used, spurious frequencies are 
produced due to signal levels at the ends of each frame not 
being equal. As a result of windowing, each frame is 
required to overlap the previous frame in time by 50 percent. 
Appending the previous m/2 samples provides this overlap. 
This allows the two triangular windowed components to add 
to the original signal when recombined. If a window type 
other than a triangular window is used, the addition of 
frames can produce oscillation errors of up to approximately 
9 percent of the original amplitude in the recombined signal. 
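
The claim that two triangular windows overlapped by 50 percent add back to the original signal can be checked numerically. The sketch below is illustrative only: the window definitions (a triangle that is zero at its first sample and peaks at the midpoint, and a symmetric Hamming window with the usual 0.54 and 0.46 coefficients) and the function names are assumptions, since the text does not give window formulas.

```python
import math


def triangular_window(m):
    """Triangle of length m: zero at n = 0, one at n = m/2 (assumed form)."""
    half = m // 2
    return [1.0 - abs(n - half) / half for n in range(m)]


def hamming_window(m):
    """Symmetric Hamming window with the usual 0.54 / 0.46 coefficients."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (m - 1)) for n in range(m)]


def overlap_sum(window):
    """Sum of the window and a copy shifted by half its length."""
    half = len(window) // 2
    return [window[n] + window[n + half] for n in range(half)]


tri = overlap_sum(triangular_window(256))
ham = overlap_sum(hamming_window(256))
print(max(abs(v - 1.0) for v in tri))  # triangular: sums to one
print(max(abs(v - 1.0) for v in ham))  # Hamming: departs from one
```

With these definitions the triangular sum is exactly one at every sample, while the Hamming sum departs from one by several percent, which is the kind of recombination error the paragraph above describes.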

Spectral subtraction can be considered as a time varying 
filter which can vary from frame to frame, and is defined by 


$$\hat{S}(f) = |\hat{S}(f)|e^{j\theta_x} = |H(f)|\,|X(f)|e^{j\theta_x} = (|X(f)| - |N(f)|)e^{j\theta_x}, \qquad |H(f)| = 1 - \frac{|N(f)|}{|X(f)|} \tag{6}$$


The filter is obtained from both the corrupted speech and
noise, and has a length of m points. The length of the time
domain response of such a filter is 2m-1. To eliminate the
effects of circular convolution, therefore, a windowed signal
of length m is zero padded by m points to a total length of
2m points. Since there is a 50 percent overlap in each frame,
only m/2 points of new input information are obtained. Since
the response lasts for 2m points, four output frames which
overlap in time must be combined to provide m/2 new output
points to provide the correct output for each frame. This is
shown in FIG. 4b.
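
The need for zero padding can be seen with a toy example. Multiplying spectra corresponds to circular convolution in time, so the sketch below compares a direct-sum circular convolution at length m and at length 2m against ordinary linear convolution. Direct sums are used here in place of an FFT purely for clarity, and the function names and the four-point signals are hypothetical.

```python
def linear_convolve(a, b):
    """Ordinary convolution; the result has len(a) + len(b) - 1 points."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def circular_convolve(a, b, size):
    """Circular convolution after zero padding both inputs to `size` points."""
    pa = list(a) + [0.0] * (size - len(a))
    pb = list(b) + [0.0] * (size - len(b))
    out = [0.0] * size
    for i in range(size):
        for j in range(size):
            out[(i + j) % size] += pa[i] * pb[j]
    return out


signal = [1.0, 2.0, 3.0, 4.0]      # m = 4 windowed samples
response = [1.0, 1.0, 1.0, 1.0]    # m-point filter response

print(linear_convolve(signal, response))       # 2m - 1 = 7 points
print(circular_convolve(signal, response, 8))  # padded to 2m: no wrap-around
print(circular_convolve(signal, response, 4))  # unpadded: tail wraps onto head
```

At length 2m the circular result matches the linear one with a single trailing zero, whereas at length m the tail of the response folds back onto the start of the frame.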

Once the signal has been windowed and zero padded the
FFT is taken of the 2m points. The resulting magnitude and
phase of the signal spectrum are determined. The phase is set
aside for later recombination with the spectral subtracted
magnitude. The magnitude of the signal spectrum is used to
determine if the frame contains voice or is voice free. This
is done by comparing the maximum value of the signal
magnitude spectrum with a proportion, $\gamma$, of the maximum
value of the average noise magnitude spectrum.

That is, if

$$\max(|X(kf)|) > \gamma \max(|\bar{N}(kf)|) \quad \text{for } k = 1, \ldots, m \tag{7}$$

then the frame is considered to be a voice frame. The
proportion, $\gamma$, can be initialized by comparing the maximum
magnitude of a known voice frame to the maximum
magnitude of the average noise.
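
The voice test of equation (7) reduces to a single comparison of peak values. The sketch below assumes the signal and average noise magnitude spectra are available as plain lists; the function name `is_voice_frame` and the sample values are hypothetical, and the default proportion of 2.0 matches the value reported later for the vehicle test.

```python
def is_voice_frame(signal_mag, noise_avg_mag, gamma=2.0):
    """Return True when the peak of the frame's magnitude spectrum exceeds
    gamma times the peak of the average noise magnitude spectrum."""
    return max(signal_mag) > gamma * max(noise_avg_mag)


noise_avg = [1.0, 1.2, 0.9]
print(is_voice_frame([1.0, 3.0, 1.0], noise_avg))  # peak 3.0 vs threshold 2.4
print(is_voice_frame([1.0, 2.0, 1.0], noise_avg))  # peak 2.0 vs threshold 2.4
```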

The average magnitude spectrum for the noise is obtained 
as follows. When the algorithm is first being initialized an 
initial noise only sequence of frames must be obtained to get 
a baseline on the average magnitude spectrum of the noise. 
For frame one of the initial noise only sequence: 

$$|\bar{N}(kf)| = |X(kf)| \quad \text{for } k = 1, \ldots, m \tag{8}$$

for other frames of the initial noise only sequence:

$$|\bar{N}(kf)| = \beta\,|\bar{N}(kf)| + (1 - \beta)\,|X(kf)| \quad \text{for } k = 1, \ldots, m \tag{9}$$

where $0.70 \le \beta \le 0.95$.
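
The running average of equations (8) and (9) can be sketched as a single update function. The name `update_noise_average` and the use of `None` to mark the first noise-only frame are assumptions for illustration; the default weighting of 0.90 is the value reported later for the vehicle test.

```python
def update_noise_average(noise_avg, frame_mag, beta=0.90):
    """Update the average noise magnitude spectrum with one unvoiced frame.

    noise_avg: previous average per bin, or None before any frame is seen.
    frame_mag: magnitude spectrum of the current noise-only frame.
    beta:      weight on the previous average; larger values adapt more slowly.
    """
    if noise_avg is None:
        # First frame of the noise-only sequence, equation (8).
        return list(frame_mag)
    # Later frames, equation (9).
    return [beta * n + (1.0 - beta) * x for n, x in zip(noise_avg, frame_mag)]


average = update_noise_average(None, [1.0, 2.0])
average = update_noise_average(average, [2.0, 2.0])
print(average)
```

A weighting near 0.95 makes the estimate smooth but slow to follow changes such as engine speed, while a value near 0.70 tracks changes quickly at the cost of a noisier estimate.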

Once the initial average noise estimate is obtained from a
known noise-only test sequence, each frame of signal is
checked for voice using equation (7). If equation (7) is not
satisfied, the frame is considered unvoiced and equation (9)
is used with a predetermined value for $\beta$ that is in the
specified range. In general, $\beta$ determines how quickly the
noise estimate can vary. The technique is simple, but works
well, since voice frames are generally strong in specific
frequencies due to excitation of the vocal cords.

After the average noise magnitude spectrum is updated,
the magnitude spectrum of the signal and the average noise
magnitude spectrum are used to perform subtraction. The
signal-to-noise ratio dependent proportion, $\alpha$, is determined
using the following equation:

$$\alpha = \frac{\eta \sum_{k=1}^{m} |\bar{N}(kf)|}{\sum_{k=1}^{m} |X(kf)|} \tag{10}$$

When the algorithm is first initialized, $\eta$ is determined by
testing a signal frame that is known to contain voice. $\eta$ is
chosen such that $\alpha$ is approximately 1.78 in the voiced
frames. Once $\alpha$ is determined, spectral subtraction is
performed using:

$$|\hat{S}(kf)| = |X(kf)| - \alpha\,|\bar{N}(kf)| \quad \text{for } k = 1, 2, \ldots, m \tag{11}$$

While the spectral subtraction may be performed on the
composite input sound signal as demonstrated in this
embodiment, other embodiments of the invention provide
for this spectral subtraction to be performed on sub-bands of
the composite spectrum of the input sound signal.
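
The signal-to-noise ratio dependent subtraction of equations (10) and (11) can be sketched as below. The function names are hypothetical, the spectra are assumed to be plain lists of magnitudes, and the guard for an all-zero frame is an added assumption. Negative results are set to zero, which is the treatment the description gives for negative magnitude estimates.

```python
def subtraction_proportion(signal_mag, noise_avg_mag, eta=4.0):
    """Equation (10): eta times total average noise over total signal magnitude."""
    total = sum(signal_mag)
    if total == 0.0:
        return 0.0  # assumption: nothing to subtract from a silent frame
    return eta * sum(noise_avg_mag) / total


def subtract_noise(signal_mag, noise_avg_mag, eta=4.0):
    """Equation (11), with negative magnitude estimates set to zero."""
    alpha = subtraction_proportion(signal_mag, noise_avg_mag, eta)
    return [max(x - alpha * n, 0.0) for x, n in zip(signal_mag, noise_avg_mag)]


# Total noise 2 against total signal 10 gives a proportion of 0.8.
print(subtraction_proportion([0.5, 9.5], [1.0, 1.0]))
print(subtract_noise([0.5, 9.5], [1.0, 1.0]))
```

Because the proportion grows as the frame's total magnitude approaches the noise total, weak frames are over subtracted more aggressively than strong ones.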

If any of the estimates for $|\hat{S}(kf)|$ are negative, they are set
to zero. $|\hat{S}(kf)|$ is then low-pass filtered to eliminate musical
noise which is generally high frequency. The lower the 3 dB
frequency of the filter, the more noise and speech eliminated.
After low-pass filtering, the phase of the noise-corrupted
speech, $\theta_x$, is combined with the magnitude of the estimate
of the speech and the inverse FFT is taken. This provides one
of the four offset output frames that must be combined using
the overlap add method described above. The summing
provides an averaging effect for reducing phase errors. If
necessary, a low level signal squelching process, performed
in the time domain, can be provided. Due to the mechanical
nature of the human vocal tract, speech cannot begin
abruptly in one time frame, or in the frames surrounding that
time frame. Thus, the low level signal squelching process
removes musical noise artifacts which tend to be high
frequency and random in nature.

The low level signal squelching processor looks at three
frames of estimated speech: the past, present and future
frames. Future frame estimates of speech are obtained by
delaying the speech estimate for one frame before being
output. Thus, the signal-to-noise ratio dependent spectral
subtraction algorithm is actually calculating the future
output, while the present output is being held in a buffer to
determine if low level squelching is required, and the past
frame is being output through the D/A. The algorithm is
described by the following equation:

$$\text{if } |\hat{s}(kT, i)| < \rho \max(|x(kT, L)|) \text{ for } k = 1, \ldots, m/2 \text{ and } i = L-1, L, L+1, \text{ then } |\hat{s}(kT, L)| = 0 \text{ for } k = 1, \ldots, m/2 \tag{12}$$

where $\rho$ is a user-selected proportion.
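
The three-frame squelch can be sketched as follows. Because the printed equation is difficult to read, the level that the proportion is applied to is passed in by the caller as `reference_peak`; that parameter, the function name `squelch`, and the sample values are assumptions, not the patent's definition. The default proportion of 0.025 is the value reported later for the vehicle test.

```python
def squelch(past, present, future, reference_peak, rho=0.025):
    """Zero the present frame when all three frames stay below the threshold.

    past, present, future: time-domain speech estimates for frames
                           L-1, L and L+1.
    reference_peak:        level the proportion is applied to (assumed to be
                           supplied by the caller).
    rho:                   user-selected proportion.
    """
    threshold = rho * reference_peak
    quiet = all(
        abs(sample) < threshold
        for frame in (past, present, future)
        for sample in frame
    )
    return [0.0] * len(present) if quiet else list(present)


# Every sample is under 0.025, so the present frame is muted.
print(squelch([0.01, -0.01], [0.02, 0.0], [0.01, 0.01], reference_peak=1.0))
# A loud sample in the future frame keeps the present frame intact.
print(squelch([0.01, -0.01], [0.02, 0.0], [0.5, 0.01], reference_peak=1.0))
```

Looking at the neighbouring frames keeps the quiet onset or tail of a word from being muted, since a word rarely fits inside a single frame.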

A noise cancellation communication system in accor- 
dance with the foregoing embodiment was tested in an 
emergency egress vehicle used to evacuate astronauts if an 
emergency situation arises during a launch. The noise level 
inside the vehicle is 90 decibels with the engine running and 
120-125 decibels once the vehicle starts moving. As a result, 
it is impossible to hear what the emergency crew is saying 
during a rescue operation. The headsets used by the rescue 
crew had microphones with noise suppression of a mechani- 
cal nature, which provided 15 decibels of noise suppression. 
Furthermore, the frequency response of the microphone 
attenuated frequencies outside of the voice band range of 
300 Hz to 3 kHz. 

Because the noise input by the microphone is directly in 
the range of voice band frequencies, standard filtering tech- 
niques attenuate both noise and speech by the same factor. 
The noise experienced was not constant. In fact, as each 
track of the egress vehicle hit the ground, the reaction force 
caused an impulse on the vehicle which excited its resonant 
frequencies. The signal-to-noise ratio dependent adaptive 
spectral subtraction algorithm was tested on the emergency 
egress vehicle using the following parameter settings:
m=256, $\gamma$=2.0, $\beta$=0.90, $\eta$=4.0, and $\rho$=0.025. The words
“test, one, two, three, four, five” were spoken into the
microphone. A signal-to-noise ratio of approximately 15 dB
existed for the original sampled signal. As mentioned, the
microphone provided approximately 15 dB of noise
attenuation. This provided a favorable signal-to-noise ratio, which
is required for spectral subtraction to work well. Lowering
the gain and talking louder also improved the signal-to-noise
ratio without saturating the voltage limits of the A/D
converter. The spectral subtraction provided approximately 20
dB of improvement in the signal-to-noise ratio. Listening
tests verified that the noise was virtually eliminated, with
little or no distortion due to musical noise.

For a further embodiment of the invention, a frequency 
sub-band based adaptive spectral subtraction algorithm is 
provided. Since the noise and speech have no physical 
dependence, the assumption that the noise and speech are in 
phase at any or all frequencies has no basis. Rather, noise 
and speech can be thought of as two independent random 
processes. The phase difference between them at any fre- 
quency may have an equal probability of being any value 
between zero and $2\pi$ radians. Thus, the noise and speech
vectors at one frequency may add with a phase shift while 
simultaneously at a different frequency may subtract with a 
different phase shift. Thus, subtracting an assumed in-phase 
noise signal from the noise-corrupted speech has the same 
probability of reducing the particular frequency component 
of the speech even further as it does of bringing it back to its
proper level. 

Furthermore, such subtraction is generally almost certain 
to cause some distortion in the phase. The amount of error 
produced at each frequency depends upon the relative phase 
shift and the relative magnitudes of the speech and noise 
vectors. For each spectral frequency that the magnitude of 
the speech is much larger than the corresponding magnitude 
of the noise, the error is negligible. For the consonant sounds 
of relatively low magnitude, the error will be larger. This is 
true even if the magnitude of the noise at each frequency
could be exactly determined during speech. For the above
reasons, the smaller the amount of noise that needs to be
subtracted off, the less the degradation of the speech.

For a given range of frequencies, say zero to six kilohertz,
each speech sound is only composed of some of the
frequencies. No typical speech sound is composed of all of the
frequencies. If the spectrum is divided into frequency
sub-bands, the frequency sub-bands containing just noise can be
removed when speech is present. Furthermore, during
speech the power level of the frequency sub-bands that
contain speech will increase by a larger proportion than the
power level of the entire spectrum. Thus, speech will be
easier to detect by looking at the sub-band power change
than by looking at the overall power change. This is
especially true of the consonant sounds, which are of lower
power, but are concentrated in one or two frequency
sub-bands. By dividing the signal into frequency sub-bands,
frequency bands that do not contain useful information can
be removed so that the noise in those frequency sub-bands
does not compete with the speech information in the useful
sub-bands.

As described above, the average magnitude of the noise
spectrum, $|\bar{N}(f)|$, is usually used to approximate the
magnitude of the noise spectrum. Since the magnitude of the noise
spectrum will in general have sharper peaks than the average
magnitude of the noise spectrum, a multiple, g, (which is
usually greater than one) of the average magnitude of the
noise spectrum is subtracted from the magnitude of the
noise-corrupted speech spectrum. This is done to reduce
“musical-noise” which is caused from the incomplete
elimination of these random peaks in the magnitude of the noise
spectrum. Unfortunately, this also removes desired speech,
which reduces intelligibility for the lower amplitude
consonant sounds. A way to reduce the number and size of the
random peaks in the magnitude of the noise spectrum is to
average the magnitude of the noise-corrupted speech
spectra over time. In general, the magnitude of the noise
spectrum has peaks that change from time frame to time
frame in a more random fashion than the magnitude of the
speech spectrum. Averaging the magnitude of the
noise-corrupted speech spectrum over multiple time frames
reduces the size and variation in these peaks without
noticeable degradation to the speech. The reduction in the size and
variation of these peaks in the magnitude of the
noise-corrupted speech spectrum allows for a smaller multiple of
the average magnitude noise spectrum to be used to
eliminate them. Since these spectral peaks are the cause of the
musical noise, removing them eliminates the musical noise.
Using a smaller proportion of average magnitude of the
noise spectrum to remove the peaks retains more of the low
amplitude speech.

The incoming sound signal is low-pass filtered to prevent
aliasing, sampled, windowed with a Hamming window, and
zero padded to twice its length. As with a triangular window,
a Hamming window tails off the signal at each end. Each
time frame, L, of the signal overlaps the previous time frame
by 50 percent. An “m” point Fast Fourier Transform is taken,
and the magnitude of the spectrum is separated from the
phase angle. The magnitude of the signal spectrum is
averaged with the magnitude of the signal spectrum from the
$\delta_m$ previous and the $\delta_m$ future time frames. The value for
$\delta_m$ is chosen small enough so as not to degrade the speech
spectrum, but large enough to smooth the variations in the
magnitude of the noise spectrum over different time frames.
The $\delta_m$ future time frames are obtained by processing frames
of data and holding the results for $\delta_m$ time frames. The phase
angle will not be altered. The phase angle for time frame L
will be associated with the averaged magnitude of the signal
spectrum described above for time frame L. This averaged
magnitude of the signal spectrum will be used throughout
the algorithm.
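
The framing and pre-averaging steps described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names (`hamming`, `make_frames`, `average_magnitudes`) are hypothetical, the magnitude spectra are assumed to have been computed already by any FFT routine, and shrinking the averaging window at the first and last frames is an assumption, since the text does not say how the edges are handled.

```python
import math


def hamming(m):
    """Hamming window of m points (m >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]


def make_frames(samples, m):
    """Cut samples into m-point frames with 50 percent overlap,
    apply a Hamming window, and zero pad each frame to 2m points."""
    window = hamming(m)
    hop = m // 2  # 50 percent overlap between consecutive frames
    frames = []
    for start in range(0, len(samples) - m + 1, hop):
        chunk = samples[start:start + m]
        windowed = [s * w for s, w in zip(chunk, window)]
        frames.append(windowed + [0.0] * m)  # zero pad to twice the length
    return frames


def average_magnitudes(mags, delta_m):
    """Average each frame's magnitude spectrum with the delta_m previous
    and delta_m future frames (assumed: window shrinks at the edges)."""
    averaged = []
    for L in range(len(mags)):
        first = max(0, L - delta_m)
        last = min(len(mags), L + delta_m + 1)
        count = last - first
        averaged.append([
            sum(mags[j][k] for j in range(first, last)) / count
            for k in range(len(mags[L]))
        ])
    return averaged
```

In a streaming system the future frames would be obtained by delaying the output by `delta_m` frames, as the text describes; the batch form above only shows the arithmetic.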

If the signal is noise-corrupted speech, $|X_L(f)|$ will be used
to represent the averaged magnitude of the noise-corrupted
speech spectrum for time frame L. If the signal just contains
noise, $|N_L(f)|$ will be used to represent the averaged
magnitude of the noise spectrum for time frame L. The average
magnitude of the signal spectrum is partitioned into
frequency sub-bands. One example of the possible choice of
frequency sub-bands is shown in Table 1. The range of
frequencies in each sub-band is, for one embodiment,
chosen in accordance with the Bark scale (as described in E.
Zwicker and H. Fastl, Psychoacoustics: Facts and Models,
Springer-Verlag, 1990) to account for the hearing
characteristics of the human ear. Other sub-bands could be used
with embodiments of the invention.


TABLE 1

Example of Possible Frequency Ranges for the Frequency Sub-Bands

Sub-band  Start Bin  Stop Bin  Number of Bins  Beginning Frequency (Hz)  Ending Frequency (Hz)
       1          1         8               8                         0                    399
       2          9        10               2                       400                    509
       3         11        13               3                       510                    629
       4         14        16               3                       630                    769
       5         17        20               4                       770                    919
       6         21        24               4                       920                   1079
       7         25        28               4                      1080                   1269
       8         29        33               5                      1270                   1479
       9         34        38               5                      1470                   1719
      10         39        45               7                      1720                   1999
      11         46        52               7                      2000                   2319
      12         53        61               9                      2320                   2699
      13         62        72              10                      2700                   3149
      14         73        84              11                      3150                   3699
      15         85       101              17                      3700                   4399
      16        102       122              21                      4400                   5299
      17        123       128               6                      5300                   6000


To key into the communication system, the user is 
required to press and hold a push-to-talk button while 
speaking into the microphone. Thus, it is assumed that 
speech is not present when the push-to-talk is not pressed. 
For each time frame, L, when the push-to-talk is not pressed,
the signal is just noise. 

$$|X_L(kf)| = |N_L(kf)| \quad \text{for frequency bins } k = 1, \ldots, m \tag{13}$$

While the push-to-talk is not pressed, the statistics of
$|N_L(f)|$ are determined, and the algorithm is initialized. The
statistics of $|N_L(f)|$ are updated every $n_A$ time frames until a
push-to-talk occurs. $n_A$ is chosen large enough to provide
reliable noise spectrum statistics and small enough to be
updated before each push-to-talk. The average of $|N_L(f)|$ for
each frequency bin is determined using the sample mean.


$$\bar{N}(kf) = \frac{1}{n_A} \sum_{L=1}^{n_A} |N_L(kf)| \quad \text{for frequency bin } k = 1, \ldots, m \tag{14}$$
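
As a small illustration of equation (14), the per-bin sample mean over the noise-only frames might be computed as below. The function name `average_noise` and the list-of-lists layout (one list of bin magnitudes per time frame) are assumptions made for this sketch.

```python
def average_noise(noise_mags):
    """Sample mean of the noise magnitude in each frequency bin.

    noise_mags[L][k] is |N_L(kf)| for time frame L and bin k;
    the number of frames plays the role of n_A in equation (14).
    """
    n_a = len(noise_mags)
    m = len(noise_mags[0])
    return [sum(frame[k] for frame in noise_mags) / n_a for k in range(m)]
```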
The power in frequency sub-band v, based on $|X_L(f)|$, for
time frame L is

$$P_{Lv} = \sum_{k=\mu_v}^{\beta_v} |X_L(kf)|^2 \tag{15}$$

where $\mu_v$ and $\beta_v$ are the beginning and ending frequency bins
for sub-band v. The average power in frequency sub-band v
over the $n_A$ time frames is estimated using the sample mean.

$$P_{Av} = \frac{1}{n_A} \sum_{L=1}^{n_A} P_{Lv} \quad \text{for sub-band } v = 1, \ldots, \eta \tag{16}$$

A unitless form of the standard deviation of the power in
frequency sub-band v over the $n_A$ time frames is estimated
using the square root of the sample variance and the sample
mean of the power.

$$\sigma_v = \frac{1}{P_{Av}} \sqrt{\frac{1}{n_A - 1} \sum_{L=1}^{n_A} \left(P_{Lv} - P_{Av}\right)^2} \quad \text{for sub-band } v = 1, \ldots, \eta \tag{17}$$

The square root of this value is used as a simple, but
crude, approximation to the standard deviation of the
average magnitude of the noise spectrum in frequency sub-band
v over the $n_A$ time frames.

$$\sigma_{Nv} = \sqrt{\sigma_v} \quad \text{for sub-band } v = 1, \ldots, \eta \tag{18}$$

The threshold proportions for speech in each frequency
sub-band are dependent upon the standard deviation of the
power in that frequency sub-band and externally adjustable
proportions $a_{d1}$ and $a_{d2}$.

$$\tau_{dv} = (a_{d1} + a_{d2}\,\sigma_v) \quad \text{for sub-band } v = 1, \ldots, \eta \tag{19}$$

Once an average value for the noise is determined, the
maximum ratio of noise to average noise over the sub-band

$$MR_{Lv} = \max_{k=\mu_v, \ldots, \beta_v} \left( \frac{|N_L(kf)|}{\bar{N}(kf)} \right) \quad \text{for sub-bands } v = 1, \ldots, \eta \tag{20}$$

and the running average of $MR_{Lv}$

$$AMR_v = (1 - \mu)\,AMR_v + \mu\,MR_{Lv} \quad \text{for sub-bands } v = 1, \ldots, \eta \tag{21}$$

are determined.

When the push-to-talk is pressed, the algorithm must
determine if speech is present during that particular time
frame. For each time frame L, the noise flags for the
sub-bands $\gamma_v$, the noise flag counter $\gamma_c$, and the noise flag
record vector $\gamma_R$, are initialized to the following values:

$$\gamma_v = 1 \quad \text{for sub-band } v = 1, \ldots, \eta \tag{22}$$

$$\gamma_c = 0 \tag{23}$$

$$\gamma_R(1) = 0 \tag{24}$$

Then, for sub-band v,

$$\text{if } \{[\text{all } P_v(L, \ldots, L+\delta_d) > \tau_{dv} P_{Av}] \text{ or } [\text{all } P_v(L-\delta_d, \ldots, L) > \tau_{dv} P_{Av}] \text{ or } [\text{all } P_v(L-\delta_c, \ldots, L+\delta_c) > \tau_{dv} P_{Av}]\} \tag{25}$$

set

$$\gamma_v = 0 \tag{26}$$

$$\gamma_c = \gamma_c + 1 \tag{27}$$

$$\gamma_R(\gamma_c) = v \tag{28}$$

Equations (25) through (28) are repeated for sub-band
v = 1, . . . , $\eta$. In equation (25), the time frame shifts $\delta_d$ and
$\delta_c$ required for speech are based upon the minimum time
duration required for most speech sounds (Digital Signal
Processing Application with the TMS320C30 Evaluation
Module: Selected Application Notes, literature number
SPRA021, 1991, p. 62). The time frame shift $\delta_d$ is used to
detect the beginning and ending of speech sounds. The
frame shift $\delta_c$ detects isolated speech sounds. $\delta_c$ is generally
half the size of $\delta_d$. Equation (25) looks into the future (i.e.,
$P_v(L, \ldots, L+\delta_d)$) by processing frames of data but holding
back decisions on them for $\delta_d$ time frames.

After using equation (25) to check all of the sub-bands, if
[($\gamma_c > 1$) or ($\gamma_R(1) > 14$)], the frame is considered to be a
speech frame. During speech frames, the ratio of the sum of
noise-corrupted speech to sum of average noise

$$R_{Lv} = \frac{\sum_{k=\mu_v}^{\beta_v} |X_L(kf)|}{\sum_{k=\mu_v}^{\beta_v} \bar{N}(kf)} \quad \text{for frequency sub-bands } v = 1, \ldots, \eta \tag{29}$$

is updated. Then, the speech estimate is determined using

$$|\hat{S}_L(kf)| = |X_L(kf)| - \min\left[R_{Lv}\,(a_{pR1} + a_{pR2}\,\sigma_{Nv}),\; AMR_v\,(a_{pA1} + a_{pA2}\,\sigma_{Nv})\right] (1 + a_\gamma \gamma_v)\,\bar{N}(kf) \quad \text{for } v = 1, \ldots, \eta \text{ and } k = \mu_v, \ldots, \beta_v \tag{30}$$

If the magnitude of the estimated speech spectrum is less
than zero for any frequency, it is set equal to zero. In
equation (30), the amount of the average noise subtracted is
weighted by a minimum proportional to $R_{Lv}$ or $AMR_v$. $R_{Lv}$
is large during strong vowel sounds but small during weaker
consonant sounds. $AMR_v$ is the running average of the
proportion needed to remove all of the noise. Using the
minimum of these two terms allows the removal of large
amounts of noise in a particular frequency sub-band when it
contains relatively strong speech. Furthermore, only small
amounts of noise are removed from a particular frequency
sub-band when it contains relatively weak speech. The
above weights contain the approximation to the standard
deviation of the noise for the particular frequency sub-band,
$\sigma_{Nv}$, to account for the variation in the noise for that
frequency sub-band. The noise flag $\gamma_v$ greatly increases the
proportion subtracted when speech is not present in a
frequency sub-band. Equation (30) is designed to essentially
remove all noise in frequency sub-bands that do not contain
speech information while preserving as much speech
information as possible when removing noise from frequency
sub-bands that contain speech information. The a’s are
preset parameters.

If the time frame is not a speech frame, it is a noise frame.
During noise frames,

$$|N_L(kf)| = |X_L(kf)| \quad \text{for frequency bins } k = 1, \ldots, m, \tag{31}$$
and the following values are updated. The maximum ratio of
noise to average noise over each frequency sub-band

$$MR_{Lv} = \max_{k=\mu_v, \ldots, \beta_v} \left( \frac{|N_L(kf)|}{\bar{N}(kf)} \right) \quad \text{for frequency sub-bands } v = 1, \ldots, \eta. \tag{32}$$

The running average of $MR_{Lv}$

$$AMR_v = (1 - \mu)\,AMR_v + \mu\,MR_{Lv} \quad \text{for } v = 1, \ldots, \eta. \tag{33}$$

The running average of the power

$$P_{Av} = (1 - \mu)\,P_{Av} + \mu\,P_{Lv} \quad \text{for frequency sub-bands } v = 1, \ldots, \eta, \tag{34}$$

and the running average of the noise at each frequency

$$\bar{N}(kf) = (1 - \mu)\,\bar{N}(kf) + \mu\,|N_L(kf)| \quad \text{for } k = 1, \ldots, m. \tag{35}$$

Also, the estimated speech signal is set to zero.

$$|\hat{S}_L(kf)| = 0 \quad \text{for } k = 1, \ldots, m \tag{36}$$
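
The two per-frame branches, the speech-frame subtraction of equations (29) and (30) and the noise-frame updates of equations (31) through (36), can be sketched as follows. This is an illustrative reading of the equations, not a definitive implementation: the function names, the dictionary keys used for the preset `a` parameters, the 0-based inclusive `bands` tuples, and the choice to compute the maximum ratio before the running noise average is updated are all assumptions. The average noise is assumed to be strictly positive in every bin.

```python
def speech_estimate(x_mag, n_avg, bands, amr, sigma_n, gamma, a):
    """Speech-frame branch: equations (29) and (30).

    x_mag, n_avg: per-bin magnitudes; bands: (first, last) bin indices,
    0-based and inclusive; amr, sigma_n, gamma: one value per sub-band;
    a: preset parameters under hypothetical keys.
    """
    s_mag = [0.0] * len(x_mag)
    for v, (first, last) in enumerate(bands):
        ratio = sum(x_mag[first:last + 1]) / sum(n_avg[first:last + 1])  # eq. (29)
        weight = min(ratio * (a["pR1"] + a["pR2"] * sigma_n[v]),
                     amr[v] * (a["pA1"] + a["pA2"] * sigma_n[v]))
        weight *= 1.0 + a["gamma"] * gamma[v]  # larger when the band is flagged as noise
        for k in range(first, last + 1):
            # eq. (30), with negative magnitudes set to zero
            s_mag[k] = max(0.0, x_mag[k] - weight * n_avg[k])
    return s_mag


def noise_frame_update(x_mag, n_avg, bands, amr, p_avg, mu):
    """Noise-frame branch: equations (31) through (36).

    Updates n_avg, amr and p_avg in place and returns the zero speech estimate.
    """
    for v, (first, last) in enumerate(bands):
        max_ratio = max(x_mag[k] / n_avg[k] for k in range(first, last + 1))  # eq. (32)
        amr[v] = (1 - mu) * amr[v] + mu * max_ratio                          # eq. (33)
        power = sum(x_mag[k] ** 2 for k in range(first, last + 1))           # eq. (15)
        p_avg[v] = (1 - mu) * p_avg[v] + mu * power                          # eq. (34)
    for k in range(len(x_mag)):
        n_avg[k] = (1 - mu) * n_avg[k] + mu * x_mag[k]                       # eq. (35)
    return [0.0] * len(x_mag)                                                # eq. (36)
```

Taking the minimum of the two weighted terms means a sub-band with weak speech (small ratio) has little noise subtracted, while a sub-band with strong speech is capped by the running average of the maximum noise ratio.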


At this point, the algorithm checks to see if the
push-to-talk is still being pressed. If it is, the process is repeated
starting at equation (22). If it is not, the algorithm goes back
to the initialization stage, equation (13), to update the
statistics of the noise and obtain new threshold proportions.
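
The per-frame decision that the loop re-enters at equation (22) can be sketched as below. The sketch is hedged in two ways. First, the function names and argument layout are hypothetical. Second, clearing the noise flag and counting the sub-band when the test of equation (25) succeeds is an assumption inferred from the description of the flags, the counter, and the record vector. The caller is assumed to have buffered the sub-band powers for frames L − δ_d through L + δ_d.

```python
def band_has_speech(power, L, threshold, delta_d, delta_c):
    """Test of equation (25) for one sub-band.

    power[j] is the sub-band power of frame j; threshold is tau_dv * P_Av.
    """
    def all_above(first, last):
        return all(power[j] > threshold for j in range(first, last + 1))

    return (all_above(L, L + delta_d)                 # beginning of a speech sound
            or all_above(L - delta_d, L)              # ending of a speech sound
            or all_above(L - delta_c, L + delta_c))   # isolated speech sound


def classify_frame(band_powers, L, tau_d, p_avg, delta_d, delta_c):
    """Return (is_speech, gamma) for time frame L."""
    n_bands = len(band_powers)
    gamma = [1] * n_bands   # eq. (22): every sub-band starts flagged as noise
    gamma_r = []            # eqs. (23)-(24): empty record; counter is len(gamma_r)
    for v in range(n_bands):
        threshold = tau_d[v] * p_avg[v]
        if band_has_speech(band_powers[v], L, threshold, delta_d, delta_c):
            gamma[v] = 0            # assumed: clear the noise flag for this sub-band
            gamma_r.append(v + 1)   # eq. (28): record the 1-based sub-band number
    gamma_c = len(gamma_r)
    first_recorded = gamma_r[0] if gamma_r else 0
    # Speech frame if more than one sub-band fired, or the only one is above sub-band 14.
    is_speech = gamma_c > 1 or first_recorded > 14
    return is_speech, gamma
```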

If the system does not contain a push-to-talk, the
algorithm initializes when first turned on. It then performs as
described above with the exception that it only returns to
equation (13) upon reset.
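
The initialization stage referred to above, in which the sub-band statistics of equations (15) through (21) are gathered from noise-only frames, might look like the following sketch. Several details are assumptions and not taken from the text: the function name, the returned dictionary keys, the 0-based inclusive `bands` tuples, the use of the (n_A − 1) divisor for the sample variance, and seeding the running average of the maximum ratio with the first frame's value. At least two noise frames and a strictly positive average noise in every bin are assumed.

```python
import math


def init_statistics(noise_mags, n_avg, bands, a_d1, a_d2, mu):
    """Per-sub-band statistics from noise-only frames.

    noise_mags[L][k]: noise magnitude for frame L, bin k;
    n_avg[k]: average noise magnitude per bin, as in equation (14).
    """
    n_a = len(noise_mags)
    stats = []
    for first, last in bands:
        # eq. (15): sub-band power for each frame
        powers = [sum(f[k] ** 2 for k in range(first, last + 1)) for f in noise_mags]
        p_avg = sum(powers) / n_a                                    # eq. (16)
        variance = sum((p - p_avg) ** 2 for p in powers) / (n_a - 1)
        sigma = math.sqrt(variance) / p_avg                          # eq. (17), unitless
        sigma_n = math.sqrt(sigma)                                   # eq. (18)
        tau_d = a_d1 + a_d2 * sigma                                  # eq. (19)
        amr = None
        for f in noise_mags:
            max_ratio = max(f[k] / n_avg[k] for k in range(first, last + 1))  # eq. (20)
            # eq. (21); the first frame seeds the running average (assumption)
            amr = max_ratio if amr is None else (1 - mu) * amr + mu * max_ratio
        stats.append({"p_avg": p_avg, "sigma": sigma, "sigma_n": sigma_n,
                      "tau_d": tau_d, "amr": amr})
    return stats
```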

CONCLUSION

Adaptive noise suppression systems have been described
for removing noise from voice communication systems. A
signal-to-noise ratio dependent adaptive spectral subtraction
algorithm was described herein which eliminates the noise.
For some embodiments, pre-averaging of the input signal’s
magnitude spectrum over multiple time frames is performed
to reduce musical noise. Also, sub-band based adaptive
spectral subtraction is utilized.

The system includes a microphone, an anti-aliasing filter, an
analog-to-digital converter, a digital signal processor (DSP),
a digital-to-analog converter, and a smoothing filter. The
DSP pre-emphasizes (amplifies) higher frequency
components of received sound, including the noise and voice
components in accordance with the power characteristics of
human speech. The pre-emphasis is performed prior to
spectral subtraction to give the higher frequency
components more importance during spectral subtraction. Thus, the
intelligibility of speech is improved during the subtraction
process. The resulting output signals are then de-emphasized
(attenuated) to reduce the effect of musical noise. Finally, the
system provides a low level signal squelching process to
remove musical noise artifacts which tend to be high
frequency and random in nature.

Although specific embodiments have been illustrated and
described herein, it will be appreciated by those of ordinary
skill in the art that any arrangement that is calculated to
achieve the same purpose may be substituted for the specific
embodiment shown. This application is intended to cover
any adaptations or variations of the present invention.
Therefore, it is manifestly intended that this invention be limited
only by the claims and the equivalents thereof.


What is claimed is: 

1. A method of reducing noise in a communication
system, the method comprising: 

averaging an input sound signal’s magnitude spectrum 
over multiple time frames to reduce musical noise; 
determining an average magnitude of a noise spectrum 
while speech is not present on the input sound signal, 
wherein the average magnitude is determined for each 
of a plurality of discrete frequencies of the noise 
spectrum; 

determining a maximum ratio of noise to average noise 
over each of a plurality of sub-bands; 
determining a running average of the maximum ratio of 
noise to average noise over each sub-band; 
receiving an indication that speech may be present on the 
input sound signal; and 

for each of a plurality of frames while receiving the 
indication that speech may be present on the input 
sound signal; 

detecting whether speech is present; 
while speech is detected, estimating a speech signal 
magnitude for each discrete frequency by subtracting 
from the input sound signal magnitude for that 
discrete frequency the average noise for that discrete 
frequency multiplied by the lesser of 

(a) a ratio of a sum of noise-corrupted speech to a 
sum of average noise for the frequency sub-band 
containing that discrete frequency and 

(b) the running average of the maximum ratio of 
noise to average noise for the frequency sub-band 
containing that discrete frequency; and 

while speech is not detected, estimating the speech 
signal magnitude to be zero. 

2. The method of claim 1 wherein receiving an indication
that speech may be present further comprises receiving an 
indication that a push-to-talk button has been pressed on a 
microphone. 

3. The method of claim 1 wherein determining an input
sound signal magnitude spectrum further comprises: 

low-pass filtering the input sound signal; 
sampling m/2 samples of the input sound signal and
appending those m/2 samples to a previous m/2
samples, thereby producing m samples;
windowing the m samples to produce a windowed signal
of m points; and

zero padding the windowed signal of m points by m points
to produce a frame of 2m points.

4. The method of claim 3, wherein windowing the m
samples further comprises windowing the m samples using
a Hamming window.

5. The method of claim 1, wherein the sub-bands are
chosen according to the Bark scale. 

6. A method of reducing noise in a communication
system, the method comprising: 

receiving an input sound signal containing noise; 
framing the input sound signal by performing, for each 
frame: 

sampling m/2 samples of the input sound signal and
appending those m/2 samples to a previous m/2
samples, thereby producing m samples;
windowing the m samples to produce a windowed
signal of m points;

zero padding the windowed signal of m points by m
points to produce a frame of 2m points;
determining an average magnitude of the input sound 
signal for each of a plurality of discrete frequencies 
while speech is not present in the input sound signal; 
dividing the input sound signal spectrum into a plurality
of frequency sub-bands; 

determining which of the frequency sub-bands contain 
only noise; 

removing by a larger proportion the frequency sub-bands
containing only noise from the spectrum; and
estimating a speech signal magnitude for each discrete
frequency by subtracting from the input sound signal
magnitude for that discrete frequency the average noise
for that discrete frequency multiplied by the lesser of

(a) a ratio of a sum of noise-corrupted speech to a sum 
of average noise for the frequency sub-band contain- 
ing that discrete frequency and 

(b) the running average of the maximum ratio of noise
to average noise for the frequency sub-band
containing that discrete frequency.

7. The method of claim 6, wherein windowing the m
samples comprises windowing the m samples using a
Hamming window.

8. The method of claim 6, further comprising performing 
a Fourier transform on the input signal prior to performing 
the spectral subtraction. 

9. The method of claim 6, further comprising performing
a smoothing operation on the output signal to remove
transients produced from a digital-to-analog conversion
operation.

10. A method of reducing noise in a communication 
system, the method comprising: 

determining an average magnitude of a noise spectrum
while speech is not present on an input sound signal, 
wherein the average magnitude is determined for each 
of a plurality of discrete frequencies of the noise 
spectrum; 

determining a maximum ratio of noise to average noise
over each of a plurality of sub-bands; 
determining a running average of the maximum ratio of 
noise to average noise over each sub-band; 
receiving an indication that speech may be present on the 
input sound signal; and

for each of a plurality of frames while receiving the 
indication that speech may be present on the input 
sound signal, estimating a speech signal magnitude for 
each discrete frequency by subtracting from the input 
sound signal magnitude for that discrete frequency the 
average noise for that discrete frequency multiplied by 
the lesser of 

(a) a ratio of a sum of noise-corrupted speech to a sum 
of average noise for the frequency sub-band contain- 
ing that discrete frequency and 

(b) the running average of the maximum ratio of noise 
to average noise for the frequency sub-band contain- 
ing that discrete frequency. 

11. The method of claim 10, further comprising
determining which of the frequency sub-bands contain only
noise, and removing by a larger proportion the frequency
sub-bands containing only noise from the spectrum.

12. A method of reducing noise in a communication
system, the method comprising:

designating a plurality of frequency sub-bands for a signal 
spectrum of interest; 

designating a plurality of frequency bins for each of said 
sub-bands; 

during an initialization/update mode, determining, for
each bin, an average magnitude of noise in said system 
over a first set of time frames; 


obtaining, for each sub-band, a noise sum equal to the sum 
of the average noise magnitudes for the bins in the 
sub-band; 

for each of said frames in said first set, 

a) determining the ratio of noise to said average noise 
for each bin; 

b) determining for each sub-band, the maximum ratio 
of noise to said average noise for the bins therein; 

determining a running average of said maximum ratio for 
each sub-band; and 

during a noise reduction mode, for each frame in a second 
set of time frames, 

a) obtaining, for each sub-band, an input signal sum 
equal to the sum of the magnitudes of an input sound 
signal for the bins in the sub-band; 

b) determining the ratio of said input signal sum to said 
noise sum; and 

c) estimating a speech signal magnitude for a given bin 
as a function of 

i) the input sound signal magnitude for the given bin; 

ii) said average noise for the given bin; 

iii) the ratio of said input signal sum to said noise 
sum; and 

iv) said running average. 

13. The method of claim 12, wherein operation in said 
initialization/update mode occurs in response to an indica- 
tion that speech is not present in the input sound signal, and 
wherein operation in said noise reduction mode occurs in 
response to detection of speech. 

14. The method of claim 13, wherein said estimating 
function includes a weighted function of said ratio of said 
input signal sum to said noise sum and said running average. 

15. The method of claim 14, wherein said weighted
function is a minimum function in which said ratio of said
input signal sum to said noise sum and said running average
are weighted and compared.

16. The method of claim 15, wherein said speech signal 
magnitude estimate is the input sound signal magnitude for 
the given bin minus a value proportional to the product of 
said average noise for the bin and the lesser of the weighted 
values of said ratio of said input signal sum to said noise sum 
and said running average. 

17. The method of claim 16, further comprising deter- 
mining which of the frequency sub-bands contain only 
noise, and removing by a larger proportion the frequency 
sub-bands containing only noise from the spectrum. 

18. A method of reducing noise in a communication 
system, the method comprising: 

designating a plurality of frequency sub-bands for a signal 
spectrum of interest; 

designating a plurality of frequency bins for each of said 
sub-bands; 

during an initialization/update mode, determining, for 
each bin, an average magnitude of noise in said system 
over a first set of time frames; 

obtaining an indication of noise strength for each sub- 
band; 

for each of said frames in said first set, determining a 
noise deviation for each sub-band by 

a) determining the ratio of noise to said average noise 
for each bin; 

b) determining, for the sub-band, the maximum ratio of 
noise to said average noise for the bins therein; and 

during a noise reduction mode, for each frame in a second 
set of time frames in which an input signal is received, 
a) obtaining an indication of input signal strength for 
each sub-band; 
b) determining a signal-to-noise ratio as the ratio of
said input signal strength indication to said noise 
strength indication; and 

c) estimating a speech signal magnitude for a given bin 
as a function of 

i) the input sound signal magnitude for the given bin; 

ii) said average noise for the given bin; 

iii) said signal-to-noise ratio; and 

iv) said noise deviation. 

19. The method of claim 18, wherein said estimating
function includes a weighted function of said signal-to-noise 
ratio and said noise deviation. 

20. The method of claim 19, wherein said weighted 
function is a minimum function in which said signal-to- 
noise ratio and said noise deviation are weighted and com- 
pared. 

21. The method of claim 20, wherein said speech signal 
magnitude estimate is said input sound signal magnitude 
minus a value proportional to the product of said average 
noise and the lesser of the weighted values of said signal- 
to-noise ratio and said noise deviation. 

22. The method of claim 18, wherein the determination of 
said noise deviation includes calculating the running average 
of the maximum ratio of noise to said average noise. 

23. The method of claim 18, wherein said input signal 
strength indication is the sum of the input sound signal 
magnitudes for the bins in the sub-band, and wherein said 
noise strength indication is the sum of the average noise 
magnitudes for the bins in the sub-band. 

24. The method of claim 18, further comprising deter- 
mining which of the frequency sub-bands contain only 
noise, and removing by a larger proportion the frequency 
sub-bands containing only noise from the spectrum. 

25. An adaptive noise suppression device for a voice 
communication system, comprising: 

a signal input; 

a signal output; and 

a noise reduction processor connected between said signal 
input and signal output; 

wherein, during an initialization/update mode, for a plu- 
rality of frequency sub-bands each having a plurality of 
frequency bins, said processor is adapted to 

a) determine, for each bin, an average magnitude of 
noise over a first set of time frames; 

b) obtain an indication of noise strength for each 
sub-band; 
c) for each of said frames in said first set, determine a
noise deviation for each sub-band based on the 
maximum ratio of noise to said average noise for 
each bin in the sub-band; and 

wherein, during a noise reduction mode, for each frame in
a second set of time frames in which an input signal is 
received, said processor is adapted to 
a) obtain an indication of input signal strength for each 
sub-band; 

b) determine a signal-to-noise ratio as the ratio of said
input signal strength indication to said noise strength 
indication; and 

c) estimate a speech signal magnitude for a given bin as 
a function of 

i) the input sound signal magnitude for the given bin;

ii) said average noise for the given bin; 

iii) said signal-to-noise ratio; and 

iv) said noise deviation. 

26. The noise suppression device of claim 25, wherein
said estimating function includes a weighted function of said
signal-to-noise ratio and said noise deviation.

27. The noise suppression device of claim 26, wherein
said weighted function is a minimum function in which said
signal-to-noise ratio and said noise deviation are weighted
and compared.

28. The noise suppression device of claim 27, wherein
said speech signal magnitude estimate is said input sound
signal magnitude minus a value proportional to the product
of said average noise and the lesser of the weighted values of
said signal-to-noise ratio and said noise deviation.

29. The noise suppression device of claim 25, wherein 
said processor calculates the running average of the maxi- 
mum ratio of noise to said average noise to determine said 
noise deviation. 

30. The noise suppression device of claim 25, wherein
said input signal strength indication is the sum of the input
sound signal magnitudes for the bins in the sub-band, and
wherein said noise strength indication is the sum of the
average noise magnitudes for the bins in the sub-band.

31. The noise suppression device of claim 25, wherein
said processor determines which of the frequency sub-bands
contain only noise, and removes by a larger proportion the
frequency sub-bands containing only noise from the
spectrum.



